This document includes all information referenced in the main paper. We give full details of the analysis process and the steps taken to ensure the statistical robustness of the reported models, documenting the pre-registered models and providing justification where the final models differed. Please see the section ‘variations of statistical models tested’ for more details.

As a supplementary analysis, we also show how differences between the adverts impacted key outcome variables. However, note that the mixed-effect models tested whether the main effects were generalisable beyond specific stimulus-level differences. The advert-level results are provided here for context but do not affect the generalisability or statistical robustness of our primary conclusions.

Model Fit Considerations

For hypotheses 1-3 we used mixed-effect modelling techniques. It is essential for random-effect structures to be theoretically motivated and justified (Brown, 2021). During the pre-registration stage, we reasoned that including two separate random intercept terms, one for participants (1,322 grouping levels) and one for each material (four grouping levels), was most suitable: each participant approaches the materials with a different evaluative baseline, and each piece of material differs in message and format, creating its own baseline against which it is judged. However, during data analysis, some key considerations came to light concerning best practice for multi-level models, a topic debated across the literature.
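The pre-registered random-intercept structure can be written in lme4-style formula notation. The sketch below is illustrative only: the outcome (PK) and predictors (version, Training.condition) follow the variable names created in the data-wrangling code later in this document, and the fixed-effect part shown is an assumption for illustration rather than the final reported model.

```r
# Illustrative sketch of the pre-registered random-effects structure in
# lme4-style notation. The fixed effects shown here are placeholders; only
# the two random-intercept terms reflect the pre-registered structure.

# Random intercepts only: each participant (id) and each material (advert)
# receives its own evaluative baseline.
f_preregistered <- PK ~ version * Training.condition + (1 | id) + (1 | advert)

# With lme4 installed, such a model would be fitted along the lines of:
# m <- lme4::lmer(f_preregistered, data = imprint_df)

# The variables involved can be read back from the formula object:
all.vars(f_preregistered)
```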

For example, it is recognised that not accounting for correlations within the random effects increases the risk of a type 1 error when estimating fixed effects (Matuschek et al., 2017). For this reason, Barr et al. (2013) recommend always opting for the ‘maximal’ model (the most complex structure) that can be theoretically justified. However, increasing model complexity also reduces statistical power, requiring consideration of whether this trade-off is necessary (Matuschek et al., 2017). Matuschek et al. (2017) recommend using estimators of model fit, such as the likelihood ratio test (LRT) and the Akaike Information Criterion (AIC), to check whether increasing complexity substantially improves model fit.

This point is relevant because, during the analysis, other theoretically viable model structures were identified that involved a more complex random-effects structure. For example, adding a random slope term would account for whether the effect of a digital imprint varied depending on the unique characteristics of the material—i.e., whether the digital imprint was effective on some materials but not others. Consequently, various theoretically justified models were tested to find a random-effects structure that balanced complexity with good model fit. Following the recommendations of Matuschek et al. (2017), we used likelihood ratio tests (LRT) to compare nested models (e.g., the addition of interaction terms between fixed effects) and AIC to compare models with different random-effects structures. A lower AIC value indicates a better relative fit to the data (Cavanaugh and Neath, 2019), helping determine whether additional complexity improves (or weakens) model performance (Matuschek et al., 2017). Of all model variants tested, the model with the lowest AIC (provided it converged) was used as a baseline. Models with an AIC within two units of the lowest were considered viable, those between two and ten units above it were deemed less supported, and those ten or more units above the best model were considered unsupported (for an explanation see Cavanaugh and Neath, 2019). Full details of alternative models are provided, and deviations from pre-registered models are transparently discussed.

Criteria for determining model support (Cavanaugh & Neath, 2019):

  • Lowest AIC = best model fit
  • Within 0-2 units of the lowest = viable model; should still be considered if it better reflects the theoretical motivations of the analysis
  • Within 2-10 units of the lowest = low support for the model
  • 10 or more units above the lowest = no support for the model
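These criteria can be expressed as a small helper function. The sketch below is not part of the original analysis script; `aic_support` is a hypothetical helper that takes a named vector of AIC values, computes each model's distance from the lowest AIC, and labels the level of support using the thresholds above.

```r
# Sketch of the AIC support criteria (Cavanaugh & Neath, 2019).
# 'aic_support' is a hypothetical helper, not part of the analysis script.
aic_support <- function(aic_values) {
  delta <- aic_values - min(aic_values)   # distance from the best-fitting model
  support <- cut(delta,
                 breaks = c(-Inf, 2, 10, Inf),
                 labels = c("viable", "low support", "no support"))
  data.frame(model = names(aic_values), delta_AIC = delta, support = support)
}

# Example with made-up AIC values for three candidate structures:
aic_support(c(intercepts_only = 2501.3, random_slope = 2500.1, maximal = 2513.8))
```

In practice the AIC values would come from `AIC()` applied to fitted model objects; the example values here are invented for illustration.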

Preparing the dataset

Libraries used

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(stringr)
library(purrr)

Reading in data and exclusions

data <- read.csv("rawdata_with_wranglecode/main_data.csv")

#change name of row number identifier

data <- data %>% 
  mutate(id = row_number())

#removing missing data rows from dataset (12 in total) for participants who did not complete the survey

rows_with_blank_in_name <- grepl("^\\s*$", data$EPE_5)
df_with_blanks_in_name <- data[rows_with_blank_in_name, ]

data <- data[!rows_with_blank_in_name, ]

#removing row numbers from the dataset where both attention checks were failed (the process that identified these row numbers can be viewed in 'data_quality_check.Rmd')

data <- data %>%
  filter(!(row_number() %in% c(420, 948, 1288, 1315)))

#removing participants who completed in under half the median time (360 seconds)

data <- data %>%
  filter(Duration..in.seconds. >= 360)

#this should leave a data-frame with 1322 observations
#extract relevant variables into new data frame

data <- data %>%
  select(id, Training.condition, Advert.1, Advert.2, Advert.3, Advert.4, starts_with("PK"), starts_with("agree"), starts_with("informed"), starts_with("accurate"), starts_with("believable"), starts_with("trustworthy"), starts_with("factual"), election_reg, recall_num, recall_name, starts_with("useful"), reg_know, starts_with("EPE"), starts_with("general_confidence"), starts_with("institution"), democracy, political_interest, external_efficacy, internal_efficacy, starts_with("SM"), partyID, age_sample, gender, education, Ethnicity.simplified)

Agree/disagree item transformations

The code below converts all variables measured on a strongly disagree to strongly agree response scale from character variables to a numerical 1-7 scale. One item also needs to be reverse scored:

  • Informed item 3: ‘I am not sure who is behind this material’

There are also attention checks in the dataset that need to be removed once exclusions have been dealt with:

  • informed_2_5
  • informed_2imprint_5
  • EPE_5

Below we create functions that are applied to all agree-disagree response formats in the dataset - all variables that start with PK, agree, informed, EPE and general_confidence. A further function then reverse scores informed item three across the eight advert variations.

#converting to numeric variables from character for agree - disagree
#The persuasion knowledge measures accidentally used slightly different response options compared to the other measures ('slightly' instead of 'somewhat'), so two conversion functions are needed.

convert_numeric1 <- function(response) {
  
  # Trim leading and trailing whitespace and convert to lowercase
  response_cleaned <- tolower(trimws(response))
  
  # Define the mapping with all lowercase keys
  mapping <- c(
    "strongly disagree" = 1,
    "disagree" = 2,
    "slightly disagree" = 3,
    "neither agree nor disagree" = 4,
    "slightly agree" = 5,
    "agree" = 6,
    "strongly agree" = 7
  )
  
  # Return the mapped value, or NA if the response does not match
  return(ifelse(!is.na(mapping[response_cleaned]), mapping[response_cleaned], NA))
}

convert_numeric2 <- function(response) {
  
  # Trim leading and trailing whitespace and convert to lowercase
  response_cleaned <- tolower(trimws(response))
  
  # Define the mapping with all lowercase keys
  mapping <- c(
    "strongly disagree" = 1,
    "disagree" = 2,
    "somewhat disagree" = 3,
    "neither agree nor disagree" = 4,
    "somewhat agree" = 5,
    "agree" = 6,
    "strongly agree" = 7
  )
  
  # Return the mapped value, or NA if the response does not match
  return(ifelse(!is.na(mapping[response_cleaned]), mapping[response_cleaned], NA))
}

#applying this function to the data frame (two separate functions to account for differences in response options)
         
data <- data %>%
  mutate(across(starts_with("PK"), convert_numeric1))

data <- data %>%
  mutate(across(c(starts_with("informed"), starts_with("agree"), starts_with("EPE"), starts_with("general")), ~convert_numeric2(.x)))

#reverse scoring informed item 3

reverse_code <- function(response) {
  # Reverse a 1-7 scale (1 becomes 7, 2 becomes 6, and so on);
  # any value outside 1-7 (including NA) becomes NA
  ifelse(response %in% 1:7, 8 - response, NA_real_)
}

data <- data %>%
  mutate(across(c(informed_1_3, informed_1imprint_3, informed_2_3, informed_2imprint_3, informed_3_3, informed_3imprint_3, informed_4_3, informed_4imprint_3), ~reverse_code(.x)))

#removing the attention check columns from the dataset

data <- data %>%
  select(-informed_2_5, -informed_2imprint_5, -EPE_5)

Variable transformations for both RM and IM dataframes

The code below conducts the following transformations to the variables that will be present in both the repeated measures and independent measures data frames so they are ready to be analysed:

  • Transformed to a factor: Advert.1, Advert.2, Advert.3, Advert.4, Training.condition, reg_know, SM_use, variables starting with SM_frequency, partyID, gender, education

  • Transformed to a numerical variable: election_reg, variables starting with useful_rank and institution_trust, democracy, political_interest, external_efficacy, internal_efficacy, age_sample

Some variables will only be present in the repeated measures data frame and will be created later.

#creating factor variables through use of a function

convert_to_factor <- function(df, cols) {
  df %>%
    mutate(across(all_of(cols), as.factor))
}

data <- data %>%
  convert_to_factor(c("Advert.1", "Advert.2", "Advert.3", "Advert.4", "SM_frequency_1", "SM_use", "Training.condition", "reg_know", "SM_use", "partyID", "gender", "education", "Ethnicity.simplified"))

#Setting reference groups for: reg_know, SM_use, SM_frequency, gender, education, ethnicity

#regulation knowledge

reg_response_order <- c("There are no regulatory controls on any type of political advertising during UK elections", "All political advertising is regulated by rules set by the UK government, but there is one set of rules for advertising on television and radio and a different set of rules for advertising on the internet and social media", "All political advertising (whether on television, radio, in newspapers or the internet) is subject to the same rules set by the UK government", "Not sure")

data <- data %>%
  mutate(across(reg_know, ~factor(.x, levels = reg_response_order)))

#Social media use
  
use_response_order <- c("None, No time at all ", "Less than 1/2 hour ", "1/2 hour to 1 hour ", "1 to 2 hours ",  "Not sure")

data <- data %>%
  mutate(across(SM_use, ~factor(.x, levels = use_response_order)))
  
#SM frequency use
  
freq_response_order <- c("Never",
                         "Less than once a week",
                         "Once a week\t",
                         "Once every couple of days\t",
                         "Once a day\t",
                         "2-5 times a day",
                         "More than five times a day\t")

data <- data %>%
  mutate(across(SM_frequency_1, ~factor(.x, levels = freq_response_order)))

#gender, female as reference

gender_response_order <- c("Female", "Male", "Non-binary / third gender", "Prefer not to say")

data <- data %>%
  mutate(across(gender, ~factor(.x, levels = gender_response_order)))

#Education level, postgrad as reference

ed_response_order <- c("Postgraduate (e.g. M.Sc, Ph.D)", "Undergraduate University (e.g. BA, B.Sc, B.Ed)", "A-level, or equivalent", "GCSE level, or equivalent", "Other, please specify", "No formal qualifications")

data <- data %>%
  mutate(across(education, ~factor(.x, levels = ed_response_order)))

#Ethnicity level, white as reference

ethn_response_order <- c("White", "Asian", "Black", "Mixed", "Other")

data <- data %>%
  mutate(across(Ethnicity.simplified, ~factor(.x, levels = ethn_response_order)))

#Need to first change response options from categories to numbers for: election_reg, institution_trust, democracy, political_interest, internal_efficacy, external_efficacy, age

#Confidence in electoral regulation

data <- data %>%
  mutate(election_reg = case_when(
    election_reg == "Completely insufficient" ~ 1,
    election_reg == "Mostly insufficient" ~ 2,
    election_reg == "Slightly insufficient" ~ 3,
    election_reg == "No opinion/not sure" ~ 4,
    election_reg == "Slightly sufficient" ~ 5,
    election_reg == "Mostly sufficient" ~ 6,
    election_reg == "Completely sufficient" ~ 7
  ))

#Converting 'democracy' to a numeric variable

data <- data %>%
  mutate(democracy = case_when(
    democracy == "Very dissatisfied" ~ 1,
    democracy == "A little dissatisfied" ~ 2,
    democracy == "Fairly satisfied" ~ 3,
    democracy == "Very satisfied" ~ 4
  ))

#converting political interest to a numerical variable

data <- data %>%
  mutate(political_interest = case_when(
    political_interest == "Not at all interested" ~ 1,
    political_interest == "Not very interested" ~ 2,
    political_interest == "Slightly interested" ~ 3,
    political_interest == "Fairly interested" ~ 4,
    political_interest == "Very interested " ~ 5
  ))

#converting internal and external efficacy to numeric, 5 options

data <- data %>%
  mutate(internal_efficacy = case_when(
    internal_efficacy == "Not at all " ~ 1,
    internal_efficacy == "A little " ~ 2,
    internal_efficacy == "A moderate amount  " ~ 3,
    internal_efficacy == "A lot " ~ 4,
    internal_efficacy == "A great deal " ~ 5
  ))

data <- data %>%
  mutate(external_efficacy = case_when(
    external_efficacy == "Not at all " ~ 1,
    external_efficacy == "A little " ~ 2,
    external_efficacy == "A moderate amount  " ~ 3,
    external_efficacy == "A lot " ~ 4,
    external_efficacy == "A great deal " ~ 5
  ))

#creating numeric variables through the use of a function

convert_to_numeric <- function(df, cols) {
  df %>%
    mutate(across(all_of(cols), as.numeric))
}

#age

data$age_sample <- as.numeric(data$age_sample)

#Convert all other variables to numeric

data <- data %>%
  convert_to_numeric(c("useful_rank_1", "useful_rank_2", "useful_rank_3", "useful_rank_4", "useful_rank_5", "useful_rank_6"))

Recall variable transformations

Transformation of recall variables:

  • Recall_num: new columns need to be created specifying those who picked ‘not sure’ versus those who chose an answer, and those who were correct (chose option 2) versus those who were incorrect.

  • Recall_name: eight binary columns need to be created, indicating whether each name option was identified, e.g. ‘Common sense collective’.

  • The correct identification options are:

    • Common sense collective - advert 1
    • Breaking barriers alliance - advert 2
    • Speak freely Inc.- advert 3
    • Campaign for a better Britain - advert 4
  • Incorrect options

    • Future first
    • The peoples movement
    • Voice for the people
    • Hope something - removed from qualtrics and replaced with ad 4
    • All together

#Recall number transformation for correct/incorrect response

data <- data %>%
  mutate(recall_correct = 
           case_when(
             recall_num == 2 ~ "correct",
             TRUE ~ "incorrect"
           ))

#Recall name transformation, correct responses

data <- data %>%
  mutate(CSC = case_when(
    str_detect(recall_name, "Common Sense Collective") ~ 1,
    TRUE ~ 0
  ))

data <- data %>%
  mutate(BBA = case_when(
    str_detect(recall_name, "Breaking Barriers Alliance") ~ 1,
    TRUE ~ 0
  ))

data <- data %>%
  mutate(SFI = case_when(
    str_detect(recall_name, "Speak Freely Inc") ~ 1,
    TRUE ~ 0
  ))

data <- data %>%
  mutate(CBB = case_when(
    str_detect(recall_name, "Campaign for a better Britain") ~ 1,
    TRUE ~ 0
  ))

#incorrect responses

data <- data %>%
  mutate(FF = case_when(
    str_detect(recall_name, "Future First") ~ 1,
    TRUE ~ 0
  ))

data <- data %>%
  mutate(TPM = case_when(
    str_detect(recall_name, "The People’s movement") ~ 1,
    TRUE ~ 0
  ))

data <- data %>%
  mutate(VFP = case_when(
    str_detect(recall_name, "Voice for the People") ~ 1,
    TRUE ~ 0
  ))

data <- data %>%
  mutate(AT = case_when(
    str_detect(recall_name, "All Together") ~ 1,
    TRUE ~ 0
  ))

#number of correct names recalled, name_correct

data <- data %>%
  mutate(name_correct = CSC + BBA + SFI + CBB)

#number of incorrect names recalled, name_incorrect

#add incorrect columns together

data <- data %>%
  mutate(name_incorrect = FF + TPM + VFP + AT)

#convert campaign names to factors

data <- data %>%
  convert_to_factor(c("recall_correct", "CSC", "BBA", "SFI", "CBB", "FF", "TPM", "VFP", "AT"))

Repeated measures dataframe

The code below reshapes the wide data into long format, creating four rows for each participant and a single column for each of the outcome variables: persuasion knowledge, political goal, informedness, agreement, believability, trustworthiness, accurateness and factualness. Extra columns also specify the advert viewed and the version (imprint or no imprint).

#create a new dataframe with only the repeated measures (post-advert) variables

RM <- data %>%
  select(id, starts_with("Advert."), starts_with("PK"), starts_with("agree"), starts_with("informed"), starts_with("accurate"), starts_with("believable"), starts_with("trustworthy"), starts_with("factual"))

#when first converted into long data, eight rows are generated for each participant for the eight different advert variations, but many columns contain NA.

#persuasion knowledge df, each item separate

PK1_long <- RM %>%
  select(id, starts_with("Advert."), PK_1_1, PK_1imprint_1, PK_2_1, PK_2imprint_1, PK_3_1, PK_3imprint_1, PK_4_1, PK_4imprint_1) %>%
  pivot_longer(
    cols = c(PK_1_1, PK_1imprint_1, PK_2_1, PK_2imprint_1, PK_3_1, PK_3imprint_1, PK_4_1, PK_4imprint_1),
    names_to = "PK1",
    values_to = "PK1_value"
  )

PK2_long <- RM %>%
  select(id, starts_with("Advert."), PK_1_2, PK_1imprint_2, PK_2_2, PK_2imprint_2, PK_3_2, PK_3imprint_2, PK_4_2, PK_4imprint_2) %>%
  pivot_longer(
    cols = c(PK_1_2, PK_1imprint_2, PK_2_2, PK_2imprint_2, PK_3_2, PK_3imprint_2, PK_4_2, PK_4imprint_2),
    names_to = "PK2",
    values_to = "PK2_value"
  )

PK3_long <- RM %>%
  select(id, starts_with("Advert."), PK_1_3, PK_1imprint_3, PK_2_3, PK_2imprint_3, PK_3_3, PK_3imprint_3, PK_4_3, PK_4imprint_3) %>%
  pivot_longer(
    cols = c(PK_1_3, PK_1imprint_3, PK_2_3, PK_2imprint_3, PK_3_3, PK_3imprint_3, PK_4_3, PK_4imprint_3),
    names_to = "PK3",
    values_to = "PK3_value"
  )

PK4_long <- RM %>%
  select(id, starts_with("Advert."), PK_1_4, PK_1imprint_4, PK_2_4, PK_2imprint_4, PK_3_4, PK_3imprint_4, PK_4_4, PK_4imprint_4) %>%
  pivot_longer(
    cols = c(PK_1_4, PK_1imprint_4, PK_2_4, PK_2imprint_4, PK_3_4, PK_3imprint_4, PK_4_4, PK_4imprint_4),
    names_to = "PK4",
    values_to = "PK4_value"
  )


#political goal df, informed item 1

PG_long <- RM %>%
  select(id, starts_with("Advert."), informed_1_1, informed_1imprint_1, informed_2_1, informed_2imprint_1, informed_3_1, informed_3imprint_1, informed_4_1, informed_4imprint_1) %>%
  pivot_longer(
    cols = c(informed_1_1, informed_1imprint_1, informed_2_1, informed_2imprint_1, informed_3_1, informed_3imprint_1, informed_4_1, informed_4imprint_1),
    names_to = "political_goal",
    values_to = "PG_value"
  )

#informed df, each item separate

informed2_long <- RM %>%
  select(id, starts_with("Advert."), informed_1_2, informed_1imprint_2, informed_2_2, informed_2imprint_2, informed_3_2, informed_3imprint_2, informed_4_2, informed_4imprint_2) %>%
  pivot_longer(
    cols = c(informed_1_2, informed_1imprint_2, informed_2_2, informed_2imprint_2, informed_3_2, informed_3imprint_2, informed_4_2, informed_4imprint_2),
    names_to = "informed2",
    values_to = "informed2_value"
  )

informed3_long <- RM %>%
  select(id, starts_with("Advert."), informed_1_3, informed_1imprint_3, informed_2_3, informed_2imprint_3, informed_3_3, informed_3imprint_3, informed_4_3, informed_4imprint_3) %>%
  pivot_longer(
    cols = c(informed_1_3, informed_1imprint_3, informed_2_3, informed_2imprint_3, informed_3_3, informed_3imprint_3, informed_4_3, informed_4imprint_3),
    names_to = "informed3",
    values_to = "informed3_value"
  )

informed4_long <- RM %>%
  select(id, starts_with("Advert."), informed_1_4, informed_1imprint_4, informed_2_4, informed_2imprint_4, informed_3_4, informed_3imprint_4, informed_4_4, informed_4imprint_4) %>%
  pivot_longer(
    cols = c(informed_1_4, informed_1imprint_4, informed_2_4, informed_2imprint_4, informed_3_4, informed_3imprint_4, informed_4_4, informed_4imprint_4),
    names_to = "informed4",
    values_to = "informed4_value"
  )

#agreement df

agree_long <- RM %>%
  select(id, starts_with("Advert."), starts_with("agree")) %>%
  pivot_longer(
    cols = starts_with("agree"),
    names_to = "agree",
    values_to = "agree_value"
  )

#trustworthy df

trustworthy_long <- RM %>%
  select(id, starts_with("Advert."), starts_with("trustworthy")) %>%
  pivot_longer(
    cols = starts_with("trustworthy"),
    names_to = "trustworthy",
    values_to = "trustworthy_value"
  )

#believability df

believe_long <- RM %>%
  select(id, starts_with("Advert."), starts_with("believable")) %>%
  pivot_longer(
    cols = starts_with("believable"),
    names_to = "believable",
    values_to = "believable_value"
  )

#accurateness df

accurate_long <- RM %>%
  select(id, starts_with("Advert."), starts_with("accurate")) %>%
  pivot_longer(
    cols = starts_with("accurate"),
    names_to = "accurate",
    values_to = "accurate_value"
  )

#factual df

factual_long <- RM %>%
  select(id, starts_with("Advert."), starts_with("factual")) %>%
  pivot_longer(
    cols = starts_with("factual"),
    names_to = "factual",
    values_to = "factual_value"
  )

#Create two new variables in each indicating advert type and version viewed, so that the dataframes can be merged by these two columns

#Below are three functions that can be applied to each df to create the new variables.

# Function to add 'advert' and 'version' based on patterns in a specified column
add_advert_version <- function(data, column_name) {
  data %>%
    mutate(
      advert = case_when(
        str_detect(!!sym(column_name), "1") ~ "advert.1",
        str_detect(!!sym(column_name), "2") ~ "advert.2",
        str_detect(!!sym(column_name), "3") ~ "advert.3",
        str_detect(!!sym(column_name), "4") ~ "advert.4",
        TRUE ~ NA_character_
      ),
      version = case_when(
        str_detect(!!sym(column_name), "imprint") ~ 1,
        TRUE ~ 0
      )
    ) 
}

#apply function for agree, trust, believe, factual, accurate

agree_long <- add_advert_version(agree_long, "agree")
trustworthy_long <- add_advert_version(trustworthy_long, "trustworthy")
believe_long <- add_advert_version(believe_long, "believable")
accurate_long <- add_advert_version(accurate_long, "accurate")
factual_long <- add_advert_version(factual_long, "factual")

#PK function

PK_advert_version <- function(data, column_name) {
  data %>%
    mutate(
      advert = case_when(
        str_detect(!!sym(column_name), "PK_1") ~ "advert.1",
        str_detect(!!sym(column_name), "PK_2") ~ "advert.2",
        str_detect(!!sym(column_name), "PK_3") ~ "advert.3",
        str_detect(!!sym(column_name), "PK_4") ~ "advert.4",
        TRUE ~ NA_character_
      ),
      version = case_when(
        str_detect(!!sym(column_name), "imprint") ~ 1,
        TRUE ~ 0
      )
    ) 
}

PK1_long <- PK_advert_version(PK1_long, "PK1")
PK2_long <- PK_advert_version(PK2_long, "PK2")
PK3_long <- PK_advert_version(PK3_long, "PK3")
PK4_long <- PK_advert_version(PK4_long, "PK4")

#informed function

in_advert_version <- function(data, column_name) {
  data %>%
    mutate(
      advert = case_when(
        str_detect(!!sym(column_name), "informed_1") ~ "advert.1",
        str_detect(!!sym(column_name), "informed_2") ~ "advert.2",
        str_detect(!!sym(column_name), "informed_3") ~ "advert.3",
        str_detect(!!sym(column_name), "informed_4") ~ "advert.4",
        TRUE ~ NA_character_
      ),
      version = case_when(
        str_detect(!!sym(column_name), "imprint") ~ 1,
        TRUE ~ 0
      )
    ) 
}

PG_long <- in_advert_version(PG_long, "political_goal")
informed2_long <- in_advert_version(informed2_long, "informed2")
informed3_long <- in_advert_version(informed3_long, "informed3")
informed4_long <- in_advert_version(informed4_long, "informed4")

#the code below creates a function that filters out redundant rows, leaving 4 for each participant

clean_NA <- function(df) {
  # Identify the column(s) ending with '_value'
  value_cols <- names(df)[grepl("_value$", names(df))]
  
  # Ensure there is at least one column ending with '_value'
  if (length(value_cols) > 0) {
    df <- df %>%
      filter(!is.na(.[[value_cols]])) %>%
      distinct(id, advert, .keep_all = TRUE)
  }
  
  return(df)
}

#apply this function to all dataframes, identified through their shared '_long' suffix

df_names <- ls(pattern = "_long$")
df_list <- mget(df_names, envir = .GlobalEnv)

for (name in names(df_list)) {
  assign(name, clean_NA(get(name)), envir = .GlobalEnv)
}

#merge the dataframes back together by matching advert, participant id and version

rm_list <- list(PK1_long, PK2_long, PK3_long, PK4_long, PG_long, informed2_long, informed3_long, informed4_long, agree_long, trustworthy_long, accurate_long, believe_long, factual_long)

merged_rm <- reduce(rm_list, full_join, by = c("id", "advert", "version", "Advert.1", "Advert.2", "Advert.3", "Advert.4"))

#changing order of columns

merged_rm <- merged_rm %>%
  select(id, Advert.1, Advert.2, Advert.3, Advert.4, advert, version, everything())

#delete the variable columns e.g., 'PK1', 'informed2'

repeated_measures <- merged_rm %>%
  select(-c(PK1, PK2, PK3, PK4, political_goal, informed2, informed3, informed4, agree, trustworthy, believable, accurate, factual))

The code chunk below mean-scores the persuasion knowledge items and the informed items. These are not the only scales that will be mean-scored, but they are the only mean-scored items in the repeated measures part of the experiment (post-advert questions). Mean scoring of the EPE and political trust items occurs in a later section.

repeated_measures <- repeated_measures %>%
  rowwise() %>%
  mutate(PK = mean(c(PK1_value, PK2_value, PK3_value, PK4_value)))

repeated_measures <- repeated_measures %>%
  rowwise() %>%
  mutate(informed = mean(c(informed2_value, informed3_value, informed4_value)))

#changing the order of columns

repeated_measures <- repeated_measures %>%
  select(id, Advert.1, Advert.2, Advert.3, Advert.4, advert, version, PK, informed, PG_value, agree_value, trustworthy_value, believable_value, accurate_value, factual_value, everything())

Merged repeated measures data frame

The code below will now merge relevant variables from outside the repeated measures part of the experiment with this dataframe e.g., training condition, demographic variables and recall measures.

Variable descriptions for those with unclear names:

  • useful_rank_1 = where ‘voters’ were ranked by participants
  • SM_frequency_1 = how often participants use Facebook

#creating a new df with relevant variables e.g., controls for models

control_measures <- data %>%
  select(id, Training.condition, recall_num, recall_name, recall_correct, CSC, BBA, SFI, CBB, FF, TPM, VFP, AT, reg_know, useful_rank_1, political_interest, SM_use, SM_frequency_1, partyID, age_sample, gender, education, Ethnicity.simplified)

#matching id number with the repeated measures dataframe so these variables are repeated across rows

imprint_df <- repeated_measures %>%
  left_join(control_measures, by = "id")

#changing the order of columns

imprint_df <- imprint_df %>%
  select(id, Advert.1, Advert.2, Advert.3, Advert.4, Training.condition, advert, version, PK, informed, PG_value, agree_value, trustworthy_value, believable_value, accurate_value, factual_value, recall_num, recall_correct, CSC, BBA, SFI, CBB, FF, TPM, VFP, AT, political_interest, reg_know, SM_use, SM_frequency_1, partyID, age_sample, gender, education, Ethnicity.simplified, everything())

The code below conducts the following transformations to the variables so they are ready to be analysed:

  • Transformed to a factor: version, advert
  • Transformed to a numerical variable: PG_value, agree_value, trustworthy_value, believable_value, accurate_value, factual_value

#functions created in earlier section

imprint_df <- imprint_df %>%
  convert_to_factor(c("version", "advert"))

imprint_df <- imprint_df %>%
  convert_to_numeric(c("PG_value", "agree_value", "trustworthy_value", "believable_value", "accurate_value", "factual_value"))

Independent measures data frame

Other parts of the analysis require only one row per participant, for example when testing the effect of the training condition on outcomes such as confidence in regulation or epistemic political efficacy.

training_df <- data %>%
  select(id, Training.condition, Advert.1, Advert.2, Advert.3, Advert.4, election_reg, recall_num, recall_correct, name_correct, name_incorrect, CSC, BBA, SFI, CBB, FF, TPM, VFP, AT, starts_with("useful_rank"), reg_know, starts_with("EPE"), starts_with("general_confidence"), starts_with("institution_trust"), democracy, political_interest, external_efficacy, internal_efficacy, SM_use, starts_with("SM_frequency"), partyID, age_sample, gender, education, Ethnicity.simplified)

#Mean scoring EPE

training_df <- training_df %>%
  rowwise() %>%
  mutate(EPE_mean = mean(c(EPE_1, EPE_2, EPE_3, EPE_4)))

#Mean scoring trust, mistrust and cynicism

training_df <- training_df %>%
  rowwise() %>%
  mutate(political_trust = mean(c(general_confidence_1, general_confidence_2, general_confidence_3)))

training_df <- training_df %>%
  rowwise() %>%
  mutate(political_mistrust = mean(c(general_confidence_4, general_confidence_5, general_confidence_6)))

training_df <- training_df %>%
  rowwise() %>%
  mutate(political_cynicism = mean(c(general_confidence_7, general_confidence_8, general_confidence_9)))

Cleaning up the R environment

rm(list=setdiff(ls(), c("data", "imprint_df", "training_df")))

This document relies on the data frames formed by the code above, which can be viewed under ‘details’. This information is also stored in a separate R Markdown document, ‘datawrangling_code.Rmd’. The data frames used in the following analyses are called imprint_df and training_df. Almost all hypotheses are tested using the former, which includes four rows per participant to capture the repeated measures part of the experiment; some hypotheses are tested using the latter, which includes only one row per participant.

Pre-registered research questions

Research theme 1: the effect of viewing a digital imprint on subsequent evaluations

  • Research question 1: Does the presence of a digital imprint increase citizens’ knowledge about the source of digital campaign material?

Research theme 2: the effect of being informed about the purpose of digital imprints on subsequent evaluations

  • Research question 2: How does being informed about the purpose of digital imprints affect citizens’ knowledge of the source of campaign material?
  • Research question 3: How does being informed about the purpose of digital imprints affect citizens’ perceptions of the trustworthiness of such material?
  • Research question 4: How does being informed about the purpose of a digital imprint affect citizen views on the sufficiency of current regulatory oversight?

Pre-registered hypotheses

Research question 1:

  • H1a: Digital imprints will increase respondents’ knowledge about the source of a piece of digital campaign material, with regards to the campaigners’ political and persuasive intent.
  • H1b: The presence of a digital imprint will not increase respondents’ memory of the names of campaigners whose post they viewed.
  • H1c: The presence of a digital imprint will not increase respondents’ perception that they are more informed about the source of campaign material.

Research question 2:

  • H2a: Those who are informed about the purpose of digital imprints will be more likely to correctly recall the names of the campaigners.
  • H2b: Those who are informed about the purpose of digital imprints will perceive themselves as informed about the source of a piece of material if and only if a digital imprint is present.

Research question 3:

  • H3: Those who are informed about the purpose of digital imprints will perceive campaign content as more trustworthy if and only if a digital imprint is present with the content.

Research question 4:

  • H4: Those who are informed about the purpose of an imprint are more likely to perceive campaign laws as sufficient compared to those who are not informed about the purpose.

Important note: pre-registration deviations in final paper

In the final analysis, some changes were made to the order of the hypotheses (note that their content stayed the same) and to the pre-registered analysis script (the original can be viewed in the ‘rawdata_with_wranglecode’ folder of the GitHub repository).

These changes did not substantially alter the theorised associations between our variables: all hypotheses stayed the same. We reordered the presentation of the hypotheses only to report the outcomes more efficiently. This document includes the final version of all statistical models reported in the main paper.

If you would like to run the original pre-registered analysis script for comparison please follow these steps:

  • Locate the ‘rawdata_with_wranglecode’ folder in the repo
  • Ensure you have ‘main_data.csv’ saved in the same R project
  • Download the ‘preregistered_analysis_code.rmd’ script
  • Check you have the required packages installed (there are two sets: one for creating the data structure and one for the analysis)
  • Knit the script for ‘preregistered_analysis_code.rmd’

This should create an HTML document that can be scrolled through to view the original models, including all the assumptions for them (under the ‘details’ tabs).
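Equivalently, the script can be rendered from the R console rather than via the Knit button; a minimal sketch, assuming the file sits in the working directory of the R project:

```r
# Render the pre-registered analysis script to an HTML document
# (file name as given in the steps above; path assumed relative to the project root)
rmarkdown::render("preregistered_analysis_code.rmd",
                  output_format = "html_document")
```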

R packages: visualisation and analysis

library(lme4)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
library(lmerTest)
## 
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
## 
##     lmer
## The following object is masked from 'package:stats':
## 
##     step
library(Matrix)
library(sjPlot)
## Install package "strengejacke" from GitHub (`devtools::install_github("strengejacke/strengejacke")`) to load all sj-packages at once!
library(ggplot2)
library(ggeffects)
library(performance)
library(see)
library(patchwork)
library(knitr)
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library(broom)
library(broom.mixed)
library(htmltools)
library(rlang)
## 
## Attaching package: 'rlang'
## The following objects are masked from 'package:purrr':
## 
##     %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
##     flatten_raw, invoke, splice
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(lattice)
library(afex)
## ************
## Welcome to afex. For support visit: http://afex.singmann.science/
## - Functions for ANOVAs: aov_car(), aov_ez(), and aov_4()
## - Methods for calculating p-values with mixed(): 'S', 'KR', 'LRT', and 'PB'
## - 'afex_aov' and 'mixed' objects can be passed to emmeans() for follow-up tests
## - Get and set global package options with: afex_options()
## - Set sum-to-zero contrasts globally: set_sum_contrasts()
## - For example analyses see: browseVignettes("afex")
## ************
## 
## Attaching package: 'afex'
## The following object is masked from 'package:lme4':
## 
##     lmer
library(stats)

Above are the R packages used for analysis and visualisation. Please ensure you have these installed if you wish to knit the script. The packages ‘MASS’, ‘lavaan’ and ‘lavaanPlot’ are used in the final stage of the script. They are loaded later to avoid package conflicts with tidyverse/dplyr.
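As an illustration of the conflict being avoided: MASS exports a select() that masks dplyr::select() once attached. A sketch of two safe patterns (the object names are placeholders, not the script's actual code):

```r
# Pattern 1: call MASS without attaching it, so dplyr::select() is never masked
fit <- MASS::rlm(outcome ~ predictor, data = my_df)    # placeholder names

# Pattern 2: if MASS must be attached, qualify the masked verb explicitly
library(MASS)
my_df_small <- dplyr::select(my_df, id, outcome)       # placeholder names
```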

Pre-analysis Checks

Scale reliability

Below shows the Cronbach’s alpha scores for the mean-averaged scale items in the dataset:

  • Persuasion knowledge
  • Perceived informedness
Cronbach’s Alpha for Scales
Scale Alpha
Persuasion knowledge 0.69
Perceived informedness 0.88
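Alpha values like those in the table can be obtained with psych::alpha(); a sketch, where the item column names (PK_1 to PK_4) are placeholders rather than the dataset's actual names:

```r
# Cronbach's alpha for a multi-item scale using the psych package;
# PK_1..PK_4 are placeholder item names for illustration
pk_items <- dplyr::select(imprint_df, PK_1, PK_2, PK_3, PK_4)
pk_alpha <- psych::alpha(pk_items)
pk_alpha$total$raw_alpha   # the single reliability value reported in the table
```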

Sample

A representative sample option was used when collecting data from Prolific, matching UK census data for age, gender and ethnicity. The following table shows the composition of the sample for these demographics, as well as for education and political party identification, after exclusions.

Sample Breakdown

Univariate statistics

Below provides descriptive statistics for the key predictor and outcome variables. The first table lists the continuous measures in the repeated measures part of the experiment, and the second shows confidence in regulation, which was only measured once. As can be seen, perceived political goal was heavily skewed. This suggested the political nature of each post was easy for participants to infer.

Summary Statistics for repeated measures variables
Variable Mean Median SD Min Max Q1 Q3
Political goal 6.03 6.00 1.13 1 7 6 7.00
Persuasion knowledge 4.85 4.75 1.07 1 7 4 5.75
Perceived informedness 4.21 4.33 1.57 1 7 3 5.67
Agreement 4.54 5.00 1.52 1 7 4 6.00
Trustworthiness 3.68 4.00 1.49 1 7 3 5.00
Believability 4.50 5.00 1.55 1 7 4 6.00
Factualness 3.26 3.00 1.80 1 7 2 5.00
Accuracy 4.21 4.00 1.46 1 7 3 5.00
Summary Statistics for independent measure variables
Variable Mean Median SD Min Max Q1 Q3
Confidence in regulation 2.95 3 1.43 1 7 2 4

Below shows this information split by the two conditions: training and version viewed.

Univariate Histograms

Below then shows the distribution of each variable as a histogram.

PG Value Histogram
PK Histogram
Informed Histogram
Agree Value Histogram
Trustworthy Value Histogram
Believable Value Histogram
Factual Value Histogram
Accurate Value Histogram
Election Regulation Histogram

Bivariate Statistics

Below shows the percentage recall for each of the campaigner names. As can be seen, there is variation between the names, with ‘Speak Freely Inc’ yielding the lowest recall and ‘Campaign for a Better Britain’ the highest. This suggests some names were either more memorable than others, or more visually obvious on the page. This variation allows us to investigate whether digital imprints consistently improve recall regardless of these overall differences between names, which helps uncover the effectiveness of digital imprints across different formats in increasing citizens’ awareness of which campaigners are potentially targeting them during an election. Assessing each advert separately, included as a supplementary analysis to hypothesis 2a, helps decipher how obvious digital imprints need to be to effectively increase recall.

Percentage of Participants Who Recalled Each Name
Name Recall % No recall %
Common Sense Collective 63.09 36.91
Breaking Barriers Alliance 46.22 53.78
Speak Freely Inc 37.67 62.33
Campaign for a Better Britain 71.41 28.59
Total 54.60 45.40
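A grouped summary of the kind sketched below reproduces this type of recall table (the data frame and column names are assumptions, with recall coded 0/1):

```r
# Percentage recall per campaigner name; 'recall_df', 'name' and 'recall'
# are assumed names for illustration
recall_df %>%
  dplyr::group_by(name) %>%
  dplyr::summarise(recall_pct    = mean(recall == 1) * 100,
                   no_recall_pct = mean(recall == 0) * 100)
```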

Knowledge of Regulation

Below shows how regulation knowledge was distributed across the whole sample, with the third option being the correct answer. This measure is not included in the pre-registered analysis, but it gives a sense of how knowledgeable the sample was about regulatory law in the UK. The training and no-training conditions are shown separately. The training did not appear to affect the distribution of responses: in both conditions, the correct response option attracted the highest count of participants.

Usefulness Rankings

Below shows where in the ranking participants tended to rate ‘voters’ when asked who the digital imprint information was most useful for.

## `summarise()` has grouped output by 'Training.condition'. You can override
## using the `.groups` argument.

## `summarise()` has grouped output by 'Training.condition'. You can override
## using the `.groups` argument.

Hypothesis 1a

Outcome: political goal

Did the presence of a digital imprint with a piece of campaign material increase participant awareness that the material had a political goal?

Plot: raw data

Below shows the skewed nature of the outcome variable, as well as the association separately for each advert.

## `geom_smooth()` using formula = 'y ~ x'

Pre-registration deviations

Pre-registered model:

  • model 1 <- lmer(political goal ~ imprint viewed + (1|id) + (1|advert))

Reported model:

  • model 1 <- lmer(political goal ~ version + training + (1|participant id) + (1|material))

Main deviation:

  • The effect of training was included because, although not central to the hypotheses, the training may also have influenced this outcome; accounting for it ensures the effect of version was not overestimated

Why random effects at all?

Random effects are essential to include because individual participants are expected to evaluate the adverts from different baselines. Fitting a model that accounts for this variance in baseline assessments recognises the individual differences in perception that appear in response patterns.

Random effects are also necessary for the adverts themselves, to acknowledge that each advert will elicit different perceptions due to its content or nature. Capturing this variability reflects the reality that some adverts are, by design, more politically charged or persuasive than others, and therefore start from different evaluative baselines.

Model fit metrics: AIC

Why was the reported model chosen out of theoretically feasible variants?

It is recognised that a variety of random effect structures would be theoretically feasible. To choose the most appropriate model out of the alternatives, model fit criteria are used to assess whether adding model complexity sacrifices valid estimation by reducing statistical power (Matuschek et al., 2017).

Theoretically feasible model structures:

  • Random intercepts: Outcome ~ 1 + version + training + (1|participant id) + (1|material)
  • Maximal random slopes model: Outcome ~ 1 + version + training + (1|participant id) + (1 + version|material)
  • Material:version intercepts model: Outcome ~ 1 + version + training + (1|participant id) + (1|material:version)
  • Fixed-effect interaction: Outcome ~ 1 + version*training + (1|participant id) + (1|material)

In this case, the final chosen model included random intercepts for participant ID and advert. As can be seen below in the model variation comparisons, adding random slopes yielded only a marginal AIC difference and produced a singular fit. This strongly implies that adding model complexity in the form of random slopes does not aid valid estimation of effects in this case, likely because the effects of digital imprint version are small.

The ‘fit is singular’ warning message below indicates that some random-effect variances were estimated at the boundary of the parameter space (i.e., at or near zero), meaning the data do not support the more complex random-effect structure; in this case it relates to model structure 2.

## boundary (singular) fit: see help('isSingular')
AIC Values and Variables: Political Goal
Variations Variables AIC
Structure 1 version + training + id (random intercept) + advert (random intercept) 14977.57
Structure 2 version + training + id (random intercept) + (version & advert (random slope)) 14974.59
Structure 3 version + training + id (random intercept) + (advert:version (random intercepts vary)) 14986.75
Structure 4 version*training + id (random intercept) + advert (random intercept) 14982.11

Model Outcomes: Table

Outcome: political goal
Term Coefficient Std. Error Lower CI Upper CI p-value
(Intercept) 6.014 0.251 5.226 6.802 0.000
Digital imprint viewed (ref: not viewed) |0.045 |0.024 |-0.002 |0.093 |0.063
Training (ref: no training) |-0.011 |0.039 |-0.087 |0.065 |0.780
Random Effect (id) 0.298
Random Effect (advert) 0.248
ICC 0.411
σ² 0.885
Marginal R² 0.000
Conditional R² 0.411

Plot: model predictions

Model assumptions

The assumptions checked below are as follows:

  • normality of residuals
  • variance of residuals
  • normality of residuals for each of the random effects (id and advert)

As can be seen, because the political goal variable is so heavily skewed, the assumptions for this model are not met. This is likely because the political nature of the posts could easily be inferred from their context, without a digital imprint being needed to alert participants to it. Even though this calls the validity of the predictions into question, it can still plausibly be concluded that the political nature of the posts was easy for participants to infer, making it unlikely a digital imprint would alter this perception in this context.

## 
##  Shapiro-Wilk normality test
## 
## data:  id_effects
## W = 0.93935, p-value < 2.2e-16
## 
##  Shapiro-Wilk normality test
## 
## data:  advert_effects
## W = 0.89179, p-value = 0.3914

Outcome: Persuasion Knowledge

Did the presence of a digital imprint with a piece of campaign material increase participant awareness that the material was trying to persuade them of a certain viewpoint?

Plotting raw data

## `geom_smooth()` using formula = 'y ~ x'

Checking for model fit metrics using AIC values

#ainform_pk <- lmer(PK ~ version + (1|id) + (1|advert), data = imprint_df)

#issue with convergence for the model below; used the afex package to investigate, the bobyqa optimizer was able to converge

ainform_pk1 <- lmer(PK ~ version + Training.condition + (1 | id) + (1|advert), data = imprint_df, control = lmerControl(optimizer = "bobyqa"))

#random slopes for version and advert

ainform_pk2 <- lmer(PK ~ version + Training.condition + (1| id) + (1 + version|advert), data = imprint_df)

# advert:version intercept vary

ainform_pk3 <- lmer(PK ~ version + Training.condition + (1 | id) + (1|advert:version), data = imprint_df)

# fixed effect interaction version and training

ainform_pk4 <- lmer(PK ~ version*Training.condition + (1 | id) + (1|advert), data = imprint_df)

#visualising the AIC comparisons
aic_pk <- AIC(ainform_pk1)
aic_pk1 <- AIC(ainform_pk2)
aic_pk2 <- AIC(ainform_pk3)
aic_pk3 <- AIC(ainform_pk4)

# Define variable descriptions
variables_pk <- "version + training + id (random intercept) + advert (random intercept)"
variables_pk1 <- "version + training + id (random intercept) + (version & advert (random slope))"
variables_pk2 <- "version + training + id (random intercept) + (advert:version (random intercepts vary))"
variables_pk3 <- "version*training + id (random intercept) + advert (random intercept)"

# Create a data frame including all wanted information
aic_table <- data.frame(
  Variations = c("Model 1", "Model 2", "Model 3", "Model 4"),
  Variables = c(variables_pk, variables_pk1, variables_pk2, variables_pk3),
  AIC = c(aic_pk, aic_pk1, aic_pk2, aic_pk3)
)

#create table
aic_table %>%
  kable(caption = "AIC Values and Variables: Persuasion Knowledge", escape = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
AIC Values and Variables: Persuasion Knowledge
Variations Variables AIC
Model 1 version + training + id (random intercept) + advert (random intercept) 13985.09
Model 2 version + training + id (random intercept) + (version & advert (random slope)) 13988.71
Model 3 version + training + id (random intercept) + (advert:version (random intercepts vary)) 13997.46
Model 4 version*training + id (random intercept) + advert (random intercept) 13990.99

Model outcomes: table

  • Outcome: PK, numerical 1-7
  • Predictor: version, binary factor
  • Random effects: id and advert
  Persuasion knowledge
Predictors Estimates CI p
Intercept 4.73 4.27 – 5.19 <0.001
Digital Imprint included 0.10 0.06 – 0.14 <0.001
Training Condition 0.15 0.07 – 0.23 <0.001
Random Effects
σ2 0.60
τ00 id 0.37
τ00 advert 0.21
ICC 0.49
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.007 / 0.497

False Discovery Rate Checks for p-value

## Contrasts set to contr.sum for the following variables: version, Training.condition, advert
## REML argument to lmer() set to FALSE for method = 'PB' or 'LRT'
## Mixed Model Anova Table (Type 3 tests, LRT-method)
## 
## Model: PK ~ version + Training.condition + (1 | id) + (1 | advert)
## Data: imprint_df
## Df full model: 6
##               Effect df     Chisq p.value
## 1            version  1 22.25 ***   <.001
## 2 Training.condition  1 14.17 ***   <.001
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
##         (Intercept)            version1 Training.condition1 
##         2.19524e-04         2.39786e-06         1.68686e-04
##         (Intercept)            version1 Training.condition1 
##        2.195240e-04        7.193579e-06        2.195240e-04
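The two vectors printed above are the raw p-values and their Benjamini-Hochberg adjusted counterparts. The adjustment is a one-line base R call; the sketch below reproduces the adjusted values from the raw values shown:

```r
# Benjamini-Hochberg FDR adjustment of the raw p-values printed above
p_raw <- c("(Intercept)"         = 2.19524e-04,
           "version1"            = 2.39786e-06,
           "Training.condition1" = 1.68686e-04)
p.adjust(p_raw, method = "BH")
# matches the adjusted vector above (to printed precision)
```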

Plotting: Model Predictions

Model Assumptions

The assumptions checked below are as follows:

  • normality of residuals
  • variance of residuals
  • normality of residuals for each of the random effects (id and advert)

Hypothesis 2a and 2b

Outcome: Recall of Names

Did viewing a digital imprint with a piece of campaign material increase participants’ memory of the campaigner name?

Plotting: raw data

To test this, recall of the campaigner name is modelled with a logistic regression to see if viewing the imprint boosted recall of the name. This also uses a newly created dataframe: recall_transform.

Correct names:

  • Advert 1: Common sense collective: CSC
  • Advert 2: Breaking barriers alliance: BBA
  • Advert 3: Speak freely inc: SFI
  • Advert 4: Campaign for a better Britain: CBB
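The mixed-effects logistic regression used in this section can be fitted with lme4::glmer(); a minimal sketch, using the data frame and column names that appear in the afex output under ‘Model’:

```r
# Binomial GLMM for name recall: imprint version x training condition,
# with random intercepts for participant and advert
recall_glmm <- lme4::glmer(
  recall ~ version * Training.condition + (1 | id) + (1 | advert),
  data   = recall_df,
  family = binomial(link = "logit")
)
summary(recall_glmm)
```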

Model

  • Outcome: recall, binary factor
  • Predictor: version, binary factor
## Contrasts set to contr.sum for the following variables: recall, version, Training.condition, advert
## Mixed Model Anova Table (Type 3 tests, LRT-method)
## 
## Model: recall ~ version * Training.condition + (1 | id) + (1 | advert)
## Data: recall_df
## Df full model: 6
##                       Effect df     Chisq p.value
## 1                    version  1 26.50 ***   <.001
## 2         Training.condition  1   8.96 **    .003
## 3 version:Training.condition  1    4.51 *    .034
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1

False Discovery Rate Checks for p-value

##                  (Intercept)                     version1 
##                   0.94352386                   0.03110240 
##          Training.condition1 version1:Training.condition1 
##                   0.36639776                   0.03286222
##                  (Intercept)                     version1 
##                   0.94352386                   0.06572443 
##          Training.condition1 version1:Training.condition1 
##                   0.48853035                   0.06572443

Plotting: model predictions

## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

Assumptions and comparative model fit

The graphs below check for normality in the residuals of the model; a straight line suggests normality. As can be seen, the random effects model (right) shows a heavier tail towards the middle, suggesting some variability in the data is not captured by the model. These are not strict assumptions for a model with a binomial distribution, but they are included to give a full picture of the model’s fit.

Simple residual plot

Supplementary analysis: Advert Level Variations

Did viewing a digital imprint with a piece of campaign material increase participants’ memory of the campaigner name, and how was this affected by the aesthetic specifics of the advert?

To test this, each correct campaign name can be tested one by one with a logistic regression model to see if viewing the imprint boosted recall of the name. For this measure, there will therefore be four models and corresponding visualisations. This also uses the independent measures data frame with only one row per participant: training_df.

Correct names:

  • Advert 1: Common sense collective: CSC
  • Advert 2: Breaking barriers alliance: BBA
  • Advert 3: Speak freely inc: SFI
  • Advert 4: Campaign for a better Britain: CBB

This analysis will also tell us something about how the imprint affected recall differently for each advert. If imprints increase recall on only some adverts and not others, this may relate to the formatting and aesthetic features of the post itself, e.g., how obvious the campaign name was, or even how memorable the name was. To find evidence that digital imprints consistently increase recall, regardless of the formatting of the imprint, we should expect higher recall across all four campaign group names when an imprint is present.

Log odds from the default model are converted to odds ratios for easier interpretation. These are then presented in a table alongside the regression output. To understand the direction of an odds ratio, check the original log-odds coefficient.
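The conversion itself is exponentiation of the coefficients; a sketch, where ‘fit’ is a placeholder for any of the four glm objects:

```r
# Convert log-odds coefficients to odds ratios with Wald 95% CIs;
# 'fit' is a placeholder for one of the four logistic models
or_table <- exp(cbind(OddsRatio = coef(fit),
                      confint.default(fit)))  # Wald intervals, log-odds scale
round(or_table, 2)
```

For example, the Breaking Barriers Alliance coefficient of 0.44 exponentiates to exp(0.44) ≈ 1.55, the odds ratio shown in its table.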

  CSC
Predictors Odds Ratios CI p
(Intercept) 1.58 1.35 – 1.85 <0.001
Advert 1 [1] 1.18 0.94 – 1.47 0.154
Observations 1322
R2 Tjur 0.002
AIC 1743.022
Recall of Common Sense Collective campaign group by imprint viewed
Coefficient SE Odds ratio 95% CI(lower) 95% CI(upper)
(Intercept) 0.46 0.08 1.58 1.35 1.85
Imprint viewed with material 0.16 0.11 1.18 0.94 1.47
  BBA
Predictors Odds Ratios CI p
(Intercept) 0.69 0.59 – 0.80 <0.001
Advert 2 [1] 1.55 1.24 – 1.92 <0.001
Observations 1322
R2 Tjur 0.012
AIC 1813.606
Recall of Breaking Barriers Alliance campaign group by imprint viewed
Coefficient SE Odds ratio 95% CI(lower) 95% CI(upper)
(Intercept) -0.37 0.08 0.69 0.59 0.80
Imprint viewed with material 0.44 0.11 1.55 1.24 1.92
  SFI
Predictors Odds Ratios CI p
(Intercept) 0.51 0.43 – 0.60 <0.001
Advert 3 [1] 1.39 1.11 – 1.74 0.004
Observations 1322
R2 Tjur 0.006
AIC 1747.146
Recall of Speak Freely Inc campaign group by imprint viewed
Coefficient SE Odds ratio 95% CI(lower) 95% CI(upper)
(Intercept) -0.67 0.08 0.51 0.43 0.60
Imprint viewed with material 0.33 0.11 1.39 1.11 1.74
  CBB
Predictors Odds Ratios CI p
(Intercept) 2.28 1.94 – 2.70 <0.001
Advert 4 [1] 1.20 0.95 – 1.52 0.135
Observations 1322
R2 Tjur 0.002
AIC 1584.110
Recall of Campaign for a Better Britain campaign group by imprint viewed
Coefficient SE Odds ratio 95% CI(lower) 95% CI(upper)
(Intercept) 0.83 0.08 2.28 1.94 2.70
Imprint viewed with material 0.18 0.12 1.20 0.95 1.52

Hypotheses 3a and 3b

Outcome: Perceived Informedness

Hypothesis 1c:

Did the presence of a digital imprint with a piece of campaign material increase participant’s perception that they had been informed about the source of the content?

  • Outcome: informed, numerical 1-7
  • Predictor: version, binary factor
  • Random effects: id and advert

Hypothesis 2b:

Were trained participants more likely to correctly identify that they were less informed when a digital imprint was not present, and more informed when a digital imprint was present, compared to the group who received no training?

  • Outcome: perceived informedness, numerical 1-7
  • fixed effect: training condition x version of imprint viewed (interaction effect)
  • random effects: id and advert

Model

## boundary (singular) fit: see help('isSingular')
##            df      AIC
## informed_1  7 19075.42
## informed_2  9 19074.43
## informed_3  7 19081.48
##                  (Intercept)          Training.condition1 
##                 0.0002220959                 0.2958726420 
##                     version1 Training.condition1:version1 
##                 0.0001851958                 0.0031290706

Model outcomes: table

  Perceived Subjective Informedness
Predictors Estimates std. Error CI p
Intercept 4.08 0.19 3.71 – 4.46 <0.001
Digital Imprint included -0.07 0.07 -0.20 – 0.06 0.296
Training Condition 0.21 0.05 0.11 – 0.31 <0.001
Digital Imprint x Training Condition 0.22 0.07 0.08 – 0.36 0.002
Random Effects
σ2 1.71
τ00 id 0.62
τ00 advert 0.14
ICC 0.31
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.011 / 0.313

Plotting: raw data

## `summarise()` has grouped output by 'Training.condition'. You can override
## using the `.groups` argument.
Descriptive bivariate statistics for perceived informedness
Training.condition version n mean_informed sd_informed se_informed ci_upper ci_lower
0 0 1318 4.08 1.56 0.04 4.17 4.00
0 1 1318 4.29 1.54 0.04 4.37 4.20
1 0 1326 4.01 1.60 0.04 4.10 3.92
1 1 1326 4.44 1.54 0.04 4.53 4.36

Plotting: Model predictions

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Model assumptions

The following assumptions are checked:

  • Normality of residuals
  • Variance of residuals
  • Normality of residuals within the random effects

There are no key assumption violations, supporting that the model predictions are reliable.

## 
##  Shapiro-Wilk normality test
## 
## data:  id_effects
## W = 0.99765, p-value = 0.05214
## 
##  Shapiro-Wilk normality test
## 
## data:  advert_effects
## W = 0.87633, p-value = 0.3232

Supplementary analysis: advert level variations on informedness

Is there evidence to suggest that it is the aesthetic style and content of an advert itself that increases informedness about a source, and do digital imprints play a role in informing citizens above and beyond this?

Claims tested:

  • Informedness about the source will be increased by the presence of a digital imprint, even when accounting for variations in campaign material content and format.

To further explore this, we can conduct an analysis comparing the effect of viewing each campaign post with and without the inclusion of a digital imprint on persuasion knowledge, political goal recognition, and perceived informedness.

## `summarise()` has grouped output by 'advert'. You can override using the
## `.groups` argument.
Descriptive bivariate statistics for perceived persuasion knowledge
advert version n mean_pk sd_pk se_pk ci_upper ci_lower
advert.1 0 665 5.40 0.95 0.04 5.47 5.33
advert.1 1 657 5.51 0.98 0.04 5.58 5.43
advert.2 0 657 4.91 1.00 0.04 4.99 4.84
advert.2 1 665 5.04 1.01 0.04 5.12 4.97
advert.3 0 662 4.42 1.01 0.04 4.50 4.35
advert.3 1 660 4.59 1.02 0.04 4.67 4.51
advert.4 0 660 4.47 0.94 0.04 4.54 4.40
advert.4 1 662 4.47 0.98 0.04 4.54 4.39

## `summarise()` has grouped output by 'advert'. You can override using the
## `.groups` argument.
Descriptive bivariate statistics for perceived political goal
advert version n mean_pg sd_pg se_pg ci_upper ci_lower
advert.1 0 665 6.44 0.72 0.03 6.49 6.38
advert.1 1 657 6.45 0.74 0.03 6.51 6.39
advert.2 0 657 6.35 0.86 0.03 6.41 6.28
advert.2 1 665 6.32 0.86 0.03 6.39 6.25
advert.3 0 662 5.24 1.45 0.06 5.35 5.13
advert.3 1 660 5.43 1.36 0.05 5.53 5.32
advert.4 0 660 6.01 1.02 0.04 6.08 5.93
advert.4 1 662 6.02 1.05 0.04 6.10 5.94

## `summarise()` has grouped output by 'advert'. You can override using the
## `.groups` argument.
Descriptive bivariate statistics for perceived informedness
advert version n mean_in sd_in se_in ci_upper ci_lower
advert.1 0 665 4.12 1.51 0.06 4.24 4.01
advert.1 1 657 4.44 1.57 0.06 4.56 4.32
advert.2 0 657 3.69 1.59 0.06 3.82 3.57
advert.2 1 665 4.12 1.48 0.06 4.24 4.01
advert.3 0 662 3.75 1.49 0.06 3.86 3.63
advert.3 1 660 4.11 1.53 0.06 4.23 3.99
advert.4 0 660 4.62 1.55 0.06 4.74 4.51
advert.4 1 662 4.79 1.49 0.06 4.90 4.67

Hypothesis 4a and 4b: outcome: credibility

Did those informed about the purpose of imprints use their absence/presence to evaluate the trustworthiness/credibility of the posts?

Validating the credible dimension

Before testing these hypotheses, the dimensionality of the four measures designed to capture perceptions of credibility was assessed. A confirmatory factor analysis is used to check whether these measures fit well under the assumption that a single dimension exists. The metrics used in this assessment are as follows:

## This is lavaan 0.6-19
## lavaan is FREE software! Please report any bugs.
## 
## Attaching package: 'lavaan'
## The following object is masked from 'package:psych':
## 
##     cor2cov
Fit Measures for CFA Model with all 4 measures
Measure Value
chisq 255.9568300
df 2.0000000
pvalue 0.0000000
rmsea 0.1549599
cfi 0.9835789
tli 0.9507366
Standardised Factor Loadings
Latent Variable Indicator Standardised Loading
credible_factor trustworthy_value 0.8689097
credible_factor accurate_value 0.9219665
credible_factor believable_value 0.8959943
credible_factor factual_value 0.6870868

It can be seen that the factual-opinionated measure does not load well onto the dimension, which in turn jeopardises the model fit of the dimension, pushing it below conventional levels of acceptability.

The tables below repeat the above process, checking for the validity of the credible dimension with only the trustworthy, accuracy and believable values.

Fit Measures for CFA Model with trustworthy, accuracy and believable measures
Measure Value
chisq 0
df 0
pvalue NA
rmsea 0
cfi 1
tli 1
Standardised Factor Loadings
Latent Variable Indicator Standardised Loading
credible_factor trustworthy_value 0.8564023
credible_factor accurate_value 0.9233735
credible_factor believable_value 0.9053528

From this assessment, it was decided that the three measures of trustworthiness, accuracy and believability would be combined into one ‘credibility’ factor, and the factual-opinionated measure would be tested separately.
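For reference, the reduced one-factor model can be specified in lavaan as below (a sketch; ‘imprint_df’ is an assumed data frame name, and the indicator names follow the loadings tables above). Note that with exactly three indicators a single-factor CFA is just-identified (df = 0), which is why the fit indices for the reduced model are perfect by construction.

```r
# One-factor CFA with the three retained credibility indicators
library(lavaan)
cred_model <- 'credible_factor =~ trustworthy_value + accurate_value + believable_value'
cred_fit   <- cfa(cred_model, data = imprint_df)
fitMeasures(cred_fit, c("chisq", "df", "pvalue", "rmsea", "cfi", "tli"))
standardizedSolution(cred_fit)   # standardised loadings, as tabled above
```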

Models

  • Outcome: credible/factual, numerical 1-7
  • fixed effect: training condition x version of imprint viewed (interaction effect)
  • random effects: id and advert
  • control: agreement with post

Two models, one for each outcome variable, are created through the use of a function, using the following formula:

model <- lmer(outcome ~ Training.condition + version + agree_value + Training.condition*version + (1|id) + (1|advert))
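A sketch of such a helper function (its name and the outcome column names are illustrative, not the authors' exact code; the formula mirrors the one given above):

```r
# Fit the same mixed-model structure for different outcome variables
fit_outcome_model <- function(outcome, data) {
  f <- reformulate(
    c("Training.condition * version", "agree_value",
      "(1 | id)", "(1 | advert)"),
    response = outcome
  )
  lmerTest::lmer(f, data = data)
}

credible_model <- fit_outcome_model("credible_value", imprint_df)  # assumed column name
factual_model  <- fit_outcome_model("factual_value", imprint_df)
```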

Tables

  Perceived credibility
Predictors Estimates CI p
Intercept 1.04 0.83 – 1.24 <0.001
Training -0.02 -0.09 – 0.05 0.629
Digital Imprint included 0.06 0.00 – 0.11 0.046
Agreement 0.68 0.66 – 0.69 <0.001
Training*imprint 0.03 -0.05 – 0.11 0.452
Random Effects
σ2 0.55
τ00 id 0.16
τ00 advert 0.04
ICC 0.27
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.583 / 0.694
  Perceived fact versus opinion
Predictors Estimates CI p
Intercept 0.69 0.29 – 1.10 0.001
Training -0.01 -0.14 – 0.12 0.849
Digital Imprint included 0.02 -0.07 – 0.12 0.603
Agreement 0.56 0.53 – 0.59 <0.001
Training*imprint 0.06 -0.07 – 0.19 0.343
Random Effects
σ2 1.48
τ00 id 0.70
τ00 advert 0.14
ICC 0.36
N id 1322
N advert 4
Observations 5288
Marginal R2 / Conditional R2 0.238 / 0.514

Plotting: raw data

Below shows the raw data distributions split by both experimental manipulations (training and version) on the two key outcomes of interest.

Plotting: model predictions


Assumptions

See under ‘details’ to evaluate key model assumptions. Note that the large sample size mitigates many of the issues that could arise from violated assumptions; the checks are nonetheless visualised for full transparency.

Each model is checked for the following assumptions:

  • normality of residuals
  • equal variance of residuals
  • normal distribution of residuals for the random effects
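These checks can be run along the following lines (a sketch; `model` is a fitted lmer object as above, and the grouping-factor names match the random-effect terms):

```r
library(lme4)

# Extract the estimated random intercepts for each grouping factor
id_effects     <- ranef(model)$id[, "(Intercept)"]
advert_effects <- ranef(model)$advert[, "(Intercept)"]

# Shapiro-Wilk tests of normality for the random effects,
# as reported in the tables below
shapiro.test(id_effects)        # random intercepts for participants
shapiro.test(advert_effects)    # random intercepts for adverts

# Normality and equal variance of residuals: visual checks
qqnorm(residuals(model)); qqline(residuals(model))
plot(fitted(model), residuals(model))
```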

Credible model:

Shapiro-Wilk Test Results for Normality
Variable W_Statistic P_Value
id_effects 0.987 0.000
advert_effects 0.963 0.796
Note:
A p-value > 0.05 indicates the data is likely to be normally distributed. A p-value ≤ 0.05 suggests the data deviates from normality.

Factual model:

Shapiro-Wilk Test Results for Normality
Variable W_Statistic P_Value
id_effects 0.989 0.000
advert_effects 0.792 0.088
Note:
A p-value > 0.05 indicates the data is likely to be normally distributed. A p-value ≤ 0.05 suggests the data deviates from normality.

Hypothesis 4: outcome: confidence in regulation

Does being informed explicitly about the purpose of digital imprints and their relation to regulatory compliance increase perceptions that political advertising is sufficiently regulated in the UK?

This model uses the training_df dataframe.

  • Outcome: confidence in regulation, numerical 1-7
  • Predictor: training condition

Assumptions of normality and equal variance of residuals are violated, due to the skewed distribution of the outcome variable. A robust regression model is fitted as a robustness check; it yields the same result, increasing confidence in the reliability of this estimate.

Model

Table

The top table shows the outcome using a traditional linear regression. The bottom table shows a robust regression model, with weights generated by an M-estimator using the ‘rlm’ function from the MASS package (loaded at this point in the script). This robustness check is included due to the violation of assumptions observed in the original model.
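The pair of models can be sketched as follows. The outcome and predictor column names are assumptions for illustration; `training_df` is the data frame named above:

```r
library(MASS)

# Hypothetical column names: regulation_value (outcome, 1-7) and
# Training.condition (control vs trained)
ols_model    <- lm(regulation_value ~ Training.condition, data = training_df)
robust_model <- rlm(regulation_value ~ Training.condition, data = training_df)

summary(ols_model)
summary(robust_model)  # M-estimation downweights outlying observations
```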

  Perceived sufficiency of advertising regulation
Predictors Estimates std. Error CI p
Intercept 3.01 0.06 2.90 – 3.12 <0.001
Trained -0.11 0.08 -0.27 – 0.04 0.150
Observations 1322
R2 / R2 adjusted 0.002 / 0.001
  Perceived sufficiency of advertising regulation (robust)
Predictors Estimates std. Error CI p
Intercept 2.88 0.06 2.76 – 3.01 <0.001
Trained -0.11 0.09 -0.29 – 0.07 0.215
Observations 1322

Plotting: raw data

Plotting: model predictions

Model assumptions

Exploratory mediation model

mediate_PK <- '
  # Latent variable for credibility
  credibility =~ trustworthy_value + accurate_value + believable_value
  
  # Direct effect
  credibility ~ c*version
  
  # Indirect effect (via PK)
  PK ~ a*version
  credibility ~ b*PK
  
  # Indirect effect
  indirect := a*b
  
  # Total effect
  total := c + (a*b)
'

fit <- sem(mediate_PK, data = imprint_df)

summary(fit, standardized = TRUE, fit.measures = TRUE)
## lavaan 0.6-19 ended normally after 25 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        10
## 
##   Number of observations                          5288
## 
## Model Test User Model:
##                                                       
##   Test statistic                                89.648
##   Degrees of freedom                                 4
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                             12552.451
##   Degrees of freedom                                10
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.993
##   Tucker-Lewis Index (TLI)                       0.983
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -30526.111
##   Loglikelihood unrestricted model (H1)     -30481.287
##                                                       
##   Akaike (AIC)                               61072.222
##   Bayesian (BIC)                             61137.954
##   Sample-size adjusted Bayesian (SABIC)      61106.177
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.064
##   90 Percent confidence interval - lower         0.053
##   90 Percent confidence interval - upper         0.075
##   P-value H_0: RMSEA <= 0.050                    0.022
##   P-value H_0: RMSEA >= 0.080                    0.010
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.015
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   credibility =~                                                        
##     trustworthy_vl    1.000                               1.276    0.858
##     accurate_value    1.051    0.012   90.106    0.000    1.342    0.921
##     believable_val    1.101    0.012   88.439    0.000    1.405    0.906
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   credibility ~                                                         
##     version    (c)    0.068    0.036    1.913    0.056    0.053    0.027
##   PK ~                                                                  
##     version    (a)    0.099    0.029    3.383    0.001    0.099    0.046
##   credibility ~                                                         
##     PK         (b)   -0.274    0.017  -16.275    0.000   -0.214   -0.229
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .trustworthy_vl    0.582    0.015   39.423    0.000    0.582    0.263
##    .accurate_value    0.323    0.012   26.515    0.000    0.323    0.152
##    .believable_val    0.428    0.014   30.270    0.000    0.428    0.178
##    .PK                1.134    0.022   51.420    0.000    1.134    0.998
##    .credibility       1.543    0.040   38.189    0.000    0.948    0.948
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     indirect         -0.027    0.008   -3.312    0.001   -0.021   -0.011
##     total             0.041    0.036    1.123    0.261    0.032    0.016
#plot

lavaanPlot(model = fit, 
           coefs = TRUE,           # Show coefficients
           stand = TRUE,           # Standardized coefficients
           stars = "regressions")
#perceived informedness

mediate_informed <- '
  # Latent variable for credibility
  credibility =~ trustworthy_value + accurate_value + believable_value
  
  # Direct effect
  credibility ~ c*version
  
  # Indirect effect (via perceived informedness)
  informed ~ a*version
  credibility ~ b*informed
  
  # Indirect effect
  indirect := a*b
  
  # Total effect
  total := c + (a*b)
'

fit <- sem(mediate_informed, data = imprint_df)

summary(fit, standardized = TRUE, fit.measures = TRUE)
## lavaan 0.6-19 ended normally after 25 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        10
## 
##   Number of observations                          5288
## 
## Model Test User Model:
##                                                       
##   Test statistic                               146.950
##   Degrees of freedom                                 4
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                             12575.862
##   Degrees of freedom                                10
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.989
##   Tucker-Lewis Index (TLI)                       0.972
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -32584.118
##   Loglikelihood unrestricted model (H1)     -32510.643
##                                                       
##   Akaike (AIC)                               65188.236
##   Bayesian (BIC)                             65253.968
##   Sample-size adjusted Bayesian (SABIC)      65222.191
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.082
##   90 Percent confidence interval - lower         0.071
##   90 Percent confidence interval - upper         0.094
##   P-value H_0: RMSEA <= 0.050                    0.000
##   P-value H_0: RMSEA >= 0.080                    0.641
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.021
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   credibility =~                                                        
##     trustworthy_vl    1.000                               1.276    0.859
##     accurate_value    1.054    0.012   90.343    0.000    1.345    0.923
##     believable_val    1.098    0.012   88.170    0.000    1.401    0.904
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   credibility ~                                                         
##     version    (c)   -0.009    0.036   -0.250    0.802   -0.007   -0.004
##   informed ~                                                            
##     version    (a)    0.317    0.043    7.397    0.000    0.317    0.101
##   credibility ~                                                         
##     informed   (b)    0.158    0.012   13.658    0.000    0.123    0.194
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .trustworthy_vl    0.581    0.015   39.375    0.000    0.581    0.263
##    .accurate_value    0.314    0.012   25.873    0.000    0.314    0.148
##    .believable_val    0.439    0.014   30.790    0.000    0.439    0.183
##    .informed          2.435    0.047   51.420    0.000    2.435    0.990
##    .credibility       1.568    0.041   38.227    0.000    0.963    0.963
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     indirect          0.050    0.008    6.504    0.000    0.039    0.020
##     total             0.041    0.036    1.126    0.260    0.032    0.016
#plot

lavaanPlot(model = fit, 
           coefs = TRUE,           # Show coefficients
           stand = TRUE,           # Standardized coefficients
           stars = "regressions")